1. Field of the Invention
A conventional OCR is known that converts an image of a typewritten document, supplied as a paper form or an electronic form, into a character code format that can be edited on a computer. Such a conventional OCR is a very useful apparatus because, when information on a handwritten form or the like is to be input to a computer, it saves the effort of re-entering the information into the computer.
2. Description of the Related Art
However, the conventional OCR cannot correctly read information on a form or the like that is written in a peculiar handwriting style or that is generated by a computer. Therefore, various techniques for correctly reading such information and converting it into character codes or the like have been disclosed.
For example, Japanese Patent Application Laid-Open No. 2000-285190 discloses a data input apparatus, such as an OCR, that uses pieces of information extracted from an input form, such as ruled lines, characters, and colors, to select a similar registered form from among predetermined registered forms, and then extracts recognized information from a region defined on the selected registered form.
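The form-matching approach described above can be illustrated with a minimal sketch. It is not the method of the cited publication: the feature encoding (plain strings), the Jaccard similarity measure, the 0.8 threshold, and the names `RegisteredForm`, `jaccard`, and `match_form` are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class RegisteredForm:
    """Hypothetical registered form: a feature set plus defined data regions."""
    name: str
    features: frozenset                           # e.g. ruled lines, keywords, colors
    regions: dict = field(default_factory=dict)   # field name -> (x, y, width, height)


def jaccard(a, b):
    """Similarity of two feature sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_form(input_features, registry, threshold=0.8):
    """Return the most similar registered form, or None if none reaches the threshold.

    The threshold value is an assumption, not a figure from the cited publication.
    """
    best = max(registry,
               key=lambda form: jaccard(input_features, form.features),
               default=None)
    if best is None or jaccard(input_features, best.features) < threshold:
        return None
    return best


# Illustrative data only: one registered form with a single defined region.
registry = [
    RegisteredForm(
        name="invoice",
        features=frozenset({"line:h@120", "line:v@40", "word:INVOICE",
                            "word:FY2023", "color:blue"}),
        regions={"total": (300, 520, 120, 24)},
    )
]

unchanged_input = frozenset({"line:h@120", "line:v@40", "word:INVOICE",
                             "word:FY2023", "color:blue"})
changed_input = frozenset({"line:h@120", "line:v@40", "word:INVOICE",
                           "word:FY2024", "color:green"})

matched = match_form(unchanged_input, registry)
unmatched = match_form(changed_input, registry)
```

In this sketch, an input whose features equal those of the registered form is matched, and its defined regions (here `matched.regions["total"]`) indicate where recognized information would be read. An input that differs only in its fiscal-year text and color shares three of seven distinct features with the registered form, so under the assumed threshold no registered form is selected.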
However, in the conventional technique, when the input form changes even partially (for example, for a new fiscal year or through a color change), an appropriate registered form cannot be extracted. This is disadvantageous because the recognized information then cannot be extracted from the input form. A user must therefore register a new form each time the input form changes. This registering operation is cumbersome and places a heavy burden on the user. In addition, since the contents to be registered are not related to the data extracted from the form, the form registering operation and the designation of a data region must be performed independently.